The association between cancer and volatile organic metabolites in exhaled breath has attracted increasing 
attention from researchers. The present study systematically examines the gas profiles of metabolites in 
human exhaled breath using pattern recognition methods. Exhaled breath was collected from 85 patients with 
histologically confirmed breast disease (39 individuals with infiltrating ductal carcinoma, 25 
individuals with cyclomastopathy, and 21 individuals with mammary gland fibroma) and 45 healthy 
volunteers. Principal component analysis and partial least squares discriminant analysis were used to 
process the final data. The volatile organic metabolites differed significantly between breast 
cancer and normal controls, breast cancer and cyclomastopathy, and breast cancer and mammary gland 
fibroma; 21, 6, and 8 characteristic metabolites, respectively, played decisive roles in sample classification (P 
< 0.05). Three volatile organic metabolites in the exhaled air, 2,5,6-trimethyloctane, 
1,4-dimethoxy-2,3-butanediol, and cyclohexanone, distinguished breast cancer patients from healthy 
individuals, mammary gland fibroma patients, and patients with cyclomastopathy (P < 0.05). These 
three volatile organic metabolites associated with breast cancer may serve as novel diagnostic biomarkers. 



Breast cancer is the most prevalent malignancy among women around the world and a major cause of female 
deaths. Each year, over 1.3 million women are diagnosed with breast cancer, and approximately 0.5 million 
women die of this disease. In the United States, nearly 3 million women have a history of invasive breast 
cancer, and 226,870 new cases of breast cancer were reported in 2012. Breast cancer is a progressive disease in 
which larger tumor size and the presence of lymph node metastasis are associated with worse prognoses. An early 
breast cancer diagnosis can effectively reduce patient mortality rates. Currently, the most common breast 
cancer screening method is mammography. Many clinical trials have confirmed that mammography screening 
can significantly decrease mortality rates among breast cancer patients. However, mammography remains an 
imperfect test; in particular, one limitation of mammography screening is that it does not detect all breast tumors. 
In a randomized, controlled mammographic screening trial, mammographic sensitivity for detecting breast cancer 
ranged from 71% to 96%. Patients with dense breast tissue had even lower mammographic sensitivities, from 
48% to 70%. 

Ultrasonography is more sensitive than mammography at detecting lesions in women with dense breasts. 
However, ultrasonography does not detect most microcalcifications, which are the typical findings in ductal 
carcinoma in situ. In fact, 75% of cancers missed by ultrasonography were ductal carcinoma in situ, and 25% were 
invasive carcinomas. In addition, the results of ultrasonography can vary widely, depending on the expertise of 
the technician. 

Due to recent advances in analytical chemistry, the association between cancer and volatile organic metabolites 
in exhaled breath has attracted increasing attention from researchers. Breath analysis is suitable for disease 
screening because it is non-invasive, rapid, and readily accepted by patients. Preliminary studies have confirmed 
that analyses of exhaled volatile organic metabolites can differentiate between breast cancer patients and healthy 
controls. Here, we report a systematic study of gas metabolite profiles in human exhaled breath using pattern 
recognition methods. Volatile organic compounds from exhaled air 
in healthy individuals, breast cancer patients, and cyclomastopathy 
and mammary fibroma patients were used as profile-defining 
variables. Potential biomarkers of breast cancer, cyclomastopathy, and 
mammary fibroma were analyzed. 

Results 

Breast Cancer Patients versus Controls. A total of 434 metabolites 
were consistently detected in 50% of the samples from breast cancer 
patients and normal controls. While the two-dimensional PCA score 
plot displayed a good separation trend (Fig. 1A), the OPLSDA score 
plot demonstrated separation between breast cancer patients and 
normal controls using one predictive component and three 
orthogonal components (R2X = 0.51; R2Y = 0.876; Q2 = 0.762; Fig. 1B). 
Moreover, the R2 and Q2 values calculated from the permuted data 
were less than the original values in the validation plot, which 
confirmed the validity of the supervised model (Fig. 1C). 
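
For readers unfamiliar with the model statistics quoted in these comparisons, the usual textbook definitions of R2Y and Q2 can be sketched as follows. This is a general formulation, not a description of how SIMCA-P computes the values (the software accumulates them component by component), and the symbols y_i, \hat{y}_i, \hat{y}_{(i)}, and \bar{y} are notation introduced here for illustration.

```latex
R^2Y = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},
\qquad
Q^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_{(i)}\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}
```

Here y_i is the class label of sample i, \hat{y}_i is the value fitted by the model, \hat{y}_{(i)} is the value predicted when sample i is left out during cross-validation, and \bar{y} is the mean label. R2X is the analogous fraction of variance explained in the metabolite matrix. R2Y therefore measures goodness of fit, whereas Q2 measures predictive ability.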

Breast Cancer versus Cyclomastopathy. A total of 406 metabolites 
were consistently detected in 50% of the samples from breast cancer 
patients and patients with mammary gland hyperplasia. While the 
two-dimensional PCA score plot demonstrated a good separation trend 
(Fig. 2A), the OPLSDA score plot demonstrated separation between 
breast cancer patients and patients with mammary gland hyperplasia 
when using one predictive component and three orthogonal 
components (R2X = 0.501; R2Y = 0.777; Q2 = 0.565; Fig. 2B). 
Moreover, all of the R2 and Q2 values calculated from the 
permuted data were less than the original values in the validation 
plot, which confirmed the validity of the supervised model (Fig. 2C). 

Breast Cancer versus Mammary Gland Fibroma. A total of 408 
metabolites were consistently detected in 50% of the breast cancer 
and mammary gland fibroma samples. While the two-dimensional 
PCA score plot demonstrated a good separation trend (Fig. 3A), 
the OPLSDA score plot demonstrated separation between breast 
cancer and mammary gland fibroma using one predictive 
component and one orthogonal component (R2X = 0.225; R2Y = 0.686; Q2 
= 0.524; Fig. 3B). Moreover, all of the R2 and Q2 values calculated 
from the permuted data were less than the original values in the 
validation plot, which confirmed the validity of the supervised model 
(Fig. 3C). 
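
The logic of the permutation validation used in all three comparisons can be illustrated with a minimal sketch. It is not the SIMCA-P procedure: the statistic here is a toy stand-in (the absolute difference of two group means) rather than R2 or Q2 from a refitted model, and the function names, the seed, and the example intensities are all hypothetical.

```python
import random


def permutation_p_value(values, labels, score_fn, n_permutations=100, seed=0):
    """Estimate how often shuffled labels score at least as well as the real ones."""
    rng = random.Random(seed)
    observed = score_fn(values, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the link between samples and classes
        if score_fn(values, shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_good + 1) / (n_permutations + 1)


def mean_difference(values, labels):
    """Toy stand-in for a model statistic: absolute difference of the group means."""
    case = [v for v, g in zip(values, labels) if g == 1]
    ctrl = [v for v, g in zip(values, labels) if g == 0]
    return abs(sum(case) / len(case) - sum(ctrl) / len(ctrl))


# Hypothetical intensities for one feature: 10 controls (label 0), 10 cases (label 1).
values = [float(v) for v in range(10)] + [float(v) for v in range(100, 110)]
labels = [0] * 10 + [1] * 10
p = permutation_p_value(values, labels, mean_difference)
```

If the statistic obtained with the true labels is rarely matched by the shuffled labels, the separation is unlikely to be an artifact of overfitting, which is the same reasoning applied to the R2 and Q2 intercepts in the validation plots.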

Potential Biomarkers. Among the significant metabolites identified 
using the VIP values in the OPLSDA model and the FDR values, 29 
differential metabolites were annotated using the NIST 11 database 
with a similarity threshold of 75% (Table 1). 

2-Acetyl aminopropionic acid, methylacrylic acid, butyl acetate, 
benzocyclobutene, 4-hydroxybutanoic acid, 1,3,5,7-tetroxane, 
ethylene carbonate, 2,2-dimethyl decane, 2,3,4-trimethylheptane, 
5-methyl-3-hexanol, 5-butylnonane, tetradecane, hexadecane, 
2,3,6-trimethyloctane, benzenemethanol, alpha,alpha-dimethyl, and 
2,5-dimethylhexane-2,5-dihydroperoxide exhibited significantly lower 
levels in the group of individuals with breast cancer than in the group 
of healthy individuals (P < 0.05), and breast cancer patients had 
significantly higher levels of 2,5,6-trimethyloctane, 
1,4-dimethoxy-2,3-butanediol, cyclohexanone, dimethylacetamide, and 
trans-2-butene oxide in their exhaled breath (P < 0.05). 

The comparison of exhaled air from the breast cancer group vs. the 
cyclomastopathy group revealed significant differences in six 
potential biomarkers. The breast cancer group exhibited significantly 
lower levels of cyclooctanemethanol and trans-2-dodecen-1-ol (P 
< 0.05) but significantly higher levels of 2,5,6-trimethyloctane, 
1,4-dimethoxy-2,3-butanediol, butyl glycol, and cyclohexanone 
(P < 0.05). 

The comparison of exhaled air from the breast cancer group vs. the 
mammary gland fibroma group revealed significant differences in 
eight potential biomarkers. Cyclopentanone exhibited significantly 
lower levels in the breast cancer group (P < 0.05), which had 
significantly higher levels of 1,2-propanediol, cyclohexanone, butyl 
glycol, 1,4-dimethoxy-2,3-butanediol, 2,5,6-trimethyloctane, 
3,4,5,6-tetramethyloctane, and ethylaniline (P < 0.05). 

2,5,6-Trimethyloctane, 1,4-dimethoxy-2,3-butanediol, and 
cyclohexanone were increased significantly in breast cancer patients 
relative to healthy individuals, mammary gland fibroma patients, or 
patients with cyclomastopathy (P < 0.05). Additional information 
is provided in Table 1. 

Figure 1 | (A): PCA score plot. (B): OPLSDA score plot (one predictive component and three orthogonal components; R2X = 0.51; R2Y = 0.876; Q2 = 
0.762). (C): PLSDA validation plot intercepts: R2 = (0.0, 0.306); Q2 = (0.0, -0.512). 

Figure 2 | (A): PCA score plot. (B): OPLSDA score plot (one predictive component and three orthogonal components; R2X = 0.501; R2Y = 0.777; Q2 = 
0.565). (C): PLSDA validation plot intercepts: R2 = (0.0, 0.226); Q2 = (0.0, -0.239). 



Discussion 

Diseases of the breast are among the most common diseases 
in women. In particular, breast fibroadenoma, mammary 
gland hyperplasia, and breast cancer are three major diseases of the 
breast that are challenging for clinicians to diagnose. Existing 
screening and diagnostic techniques remain unsatisfactory. For 
instance, the detection of serum markers not only exhibits poor 
specificity but also requires an invasive blood draw, which could 
facilitate the transmission of blood-borne infectious diseases. MRI 
examinations are expensive and require both state-of-the-art 
equipment and well-trained physicians with sophisticated technological 
knowledge and skills. 

Figure 3 | (A): PCA score plot. (B): OPLSDA score plot (one predictive component and one orthogonal component; R2X = 0.225; R2Y = 0.686; Q2 = 
0.524). (C): PLSDA validation plot intercepts: R2 = (0.0, 0.26); Q2 = (0.0, -0.222). 
Table 1 | Potential biomarkers 

                                                 Breast cancer vs.           Breast cancer vs.           Breast cancer vs.
                                                 healthy controls            cyclomastopathy             mammary gland fibroma
Potential biomarker                      RT      p-value     FC      VIP     p-value     FC      VIP     p-value     FC      VIP
1,2-Propanediol                          3.04    -           -       -       -           -       -       [illegible] 7.59    1.53
2-Acetyl aminopropionic acid             3.95    2.33E-07    -2.26   1.73    -           -       -       -           -       -
Cyclopentanone                           4.28    -           -       -       -           -       -       2.05E-02    -1.02   1.54
Methylacrylic acid                       4.29    3.66E-06    -6.69   1.51    -           -       -       -           -       -
Butyl acetate                            4.29    6.66E-07    -6.23   1.73    -           -       -       -           -       -
trans-2-Butene oxide                     5.53    6.28E-07    1.09    1.75    -           -       -       -           -       -
Dimethylacetamide                        5.54    1.04E-09    1.32    2.18    -           -       -       -           -       -
Benzocyclobutene                         6.06    1.06E-14    -8.51   2.02    -           -       -       -           -       -
Cyclohexanone                            6.14    2.22E-10    7.87    2.76    3.30E-08    2.58    2.7     2.36E-07    2.76    2.37
Butyl glycol                             6.35    -           -       -       1.03E-05    0.75    1.78    2.17E-07    1.16    2.45
4-Hydroxybutanoic acid                   6.46    6.69E-09    -2.12   2.18    -           -       -       -           -       -
1,3,5,7-Tetroxane                        6.59    3.55E-12    -4.68   1.95    -           -       -       -           -       -
Ethylene carbonate                       7.56    9.51E-06    -2.1    1.78    -           -       -       -           -       -
1,4-Dimethoxy-2,3-butanediol             7.64    5.60E-08    7.79    2.03    1.09E-05    3.06    7.79    1.65E-06    6.25    2.01
2,5,6-Trimethyloctane                    7.64    2.97E-09    7.67    2.07    9.99E-07    7.67    7.9     2.37E-06    2.47    1.84
2,2-Dimethyl decane                      8.36    1.17E-09    -0.95   1.61    -           -       -       -           -       -
3,4,5,6-Tetramethyloctane                8.37    -           -       -       -           -       -       2.32E-04    0.87    1.51
2,3,4-Trimethylheptane                   10.57   4.55E-12    -9.48   1.86    -           -       -       -           -       -
5-Methyl-3-hexanol                       10.57   9.22E-13    -11.42  1.64    -           -       -       -           -       -
5-Butylnonane                            10.58   6.78E-11    -6.72   1.71    -           -       -       -           -       -
2,3,6-Trimethyloctane                    10.58   4.75E-09    -3.53   1.97    -           -       -       -           -       -
Benzenemethanol, alpha,alpha-dimethyl    10.58   7.86E-09    -2.4    1.64    -           -       -       -           -       -
Ethylaniline                             11.5    -           -       -       -           -       -       7.91E-03    2.45    1.65
Cyclooctanemethanol                      11.81   -           -       -       6.27E-04    -0.58   1.5     -           -       -
trans-2-Dodecen-1-ol                     12.5    -           -       -       6.79E-05    -0.62   1.67    -           -       -
2,5-Dimethylhexane-2,5-dihydroperoxide   16.2    1.94E-14    -13.37  1.92    -           -       -       -           -       -
Tetradecane                              20.57   6.22E-07    -4.14   1.79    -           -       -       -           -       -
Hexadecane                               20.58   1.08E-06    -4.56   1.78    -           -       -       -           -       -

Abbreviations: RT, retention time; VIP, variable importance in the projection; FC, fold change, defined as FC = log2(X1/X2), where X1 denotes the arithmetic mean value of a given metabolite in the case 
group and X2 denotes the arithmetic mean value in the control group. A positive FC indicates that the concentration of the metabolite is higher in breast cancer than in healthy 
controls, cyclomastopathy, or mammary gland fibroma. A negative FC indicates that the concentration of the metabolite is lower in breast cancer than in healthy controls, 
cyclomastopathy, or mammary gland fibroma. 
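
The fold-change definition in the table footnote can be written as a short sketch. The function name and the example intensities are hypothetical and serve only to illustrate the formula FC = log2(X1/X2).

```python
import math


def log2_fold_change(case_values, control_values):
    """FC = log2(X1 / X2), with X1 and X2 the arithmetic means of the two groups."""
    x1 = sum(case_values) / len(case_values)
    x2 = sum(control_values) / len(control_values)
    if x1 <= 0 or x2 <= 0:
        raise ValueError("group means must be positive to take a log ratio")
    return math.log2(x1 / x2)


# Hypothetical normalized intensities, not values from this study.
up = log2_fold_change([8.0, 8.0], [2.0, 2.0])    # case mean is 4x the control mean
down = log2_fold_change([1.0, 3.0], [8.0, 8.0])  # case mean is 1/4 of the control mean
```

Because the scale is logarithmic, an FC of 2 corresponds to a fourfold higher mean level in the case group, and an FC of -2 to a fourfold lower mean level.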



In recent years, several studies have confirmed that certain specific 
volatile metabolites are present in abnormally high concentrations in 
the exhalations of breast cancer patients, and the origin of these 
compounds has been analyzed. Peng reported that exhaled 
breaths from breast cancer patients and healthy controls exhibited 
significantly different levels of five volatile compounds: 3,3-dimethyl 
pentane; 2-amino-5-isopropyl-8-methyl-1-azulenecarbonitrile; 
5-(2-methylpropyl)nonane; 2,3,4-trimethyl decane; and 6-ethyl-3- 
octyl ester 2-trifluoromethyl benzoic acid. Phillips et al. suggested 
that alkanes and methylated alkane derivatives could be utilized as 
specific volatile markers for breast cancer; in particular, the 
researchers proposed that these compounds were produced because the 
mitochondrial release of reactive oxygen species (ROS) created oxidative 
stress that resulted in lipid peroxidation of polyunsaturated fatty 
acids in the cell membrane. Hietanen et al. observed significantly 
higher than normal pentane concentrations in exhaled breath 
samples from breast cancer patients, and they conjectured that this 
difference originated from the peroxidation of fatty acids in the cell 
membrane. 

Previous studies that have addressed the use of exhaled breath 
analysis in the context of breast disease have had a few 
shortcomings. First, these studies have been limited to comparisons 
between cancer patients and healthy individuals; thus, the results 
of these investigations can be used to discriminate between these 
two groups but are not particularly helpful for the screening and 
differential diagnosis of breast diseases. Second, various complex 
pathological types of breast cancer are observed in patients, and these 
studies do not distinguish among these different pathological types; 
however, it is known that different pathological types of cancer cells 
can generate unique volatile metabolites. 



By comparing these three sets of experimental results, we found 
that three potential biomarkers, 2,5,6-trimethyloctane, 
1,4-dimethoxy-2,3-butanediol, and cyclohexanone, were markedly 
more concentrated in the exhaled air from breast cancer patients 
relative to the exhaled air from healthy individuals, mammary gland 
fibroma patients, or patients with cyclomastopathy. We therefore 
concluded that these three chemicals were relatively specific for 
breast cancer. Although butyl glycol levels were similar in the exhaled 
air from breast cancer patients and healthy individuals, significantly 
higher butyl glycol levels were detected in the exhaled air of breast 
cancer patients than in the exhaled air of breast fibroma patients or 
patients with cyclomastopathy. We speculate that this discrepancy 
reflects external contamination because we would otherwise have 
expected to observe a difference between the breast cancer patients 
and healthy individuals. Butyl glycol is commonly used in paint and 
ink solvents, metal cleaning agents, and dye dispersants. Each of the 
remaining examined metabolites exhibited specificity for a particular 
participant group, without crossover between these groups. Thus, the 
data from each group could potentially be used to construct specific 
metabolite models for breast cancer, mammary gland fibroma, and 
cyclomastopathy that could be used for the clinical screening of these 
breast diseases. 

Most of the volatile markers identified in this study are alkanes, 
ketones, aldehydes, alcohols, or olefins. The mechanisms through 
which these metabolites are generated continue to be explored, and 
no consensus has been reached. However, the majority of 
relevant experimental results support the idea that these compounds 
result from oxidative stress. Tumor tissues are characterized by 
vigorous growth and high energy demands. The malignant growth 
processes of cancer cells can lead to gene mutations and protein 
expression abnormalities. As a result, polyunsaturated fatty acids in 
cell membranes are subject to excessive oxidation, and an individual 
may produce excessive ROS. In addition, reductions in other 
chemicals may result from the consumption of these substances by 
tumor cells. The specific biological mechanisms that generate 
volatile metabolites remain under investigation. Phillips has 
hypothesized that the volatile biomarkers of breast cancer relate to changes in 
estrogen metabolism mechanisms. This notion is supported by the 
fact that estrogen can stimulate the proliferation of both 
normal and neoplastic mammary epithelial cells. Investigations 
have also revealed that abnormally high aromatase expression occurs 
in breast cancer tissues. Aromatase (estrogen synthetase) is a 
component of the cytochrome P450 (CYP) enzyme complex that 
converts C19 androgens to C18 estrogens, thereby increasing estrogen 
generation. Other P450 enzymes are also activated in breast cancer 
tissues, such as CYP1A1, CYP1B1, and CYP3A4. P450 can induce a 
wide variety of biological responses, including promoting the 
biotransformation of alkanes, alkenes, and aromatic compounds. 
The metabolism of normal body cells can generate certain volatile 
metabolites, such as alkanes, that are produced as a result of oxidative 
stress. Given the multitude of P450 functions, the elevated activity 
of this enzyme in breast cancer tissues may markedly alter the 
components of exhaled air. Phillips has suggested that breast diseases are 
associated with increases in oxidative stress and elevated cytochrome 
P450 activity. 

Table 2 | Demographic characteristics of study subjects 

                    IDC            CMP           MGF           Normal
Subjects (n)        39             25            21            45
Age (mean ± SD)     53.4 ± 10.6    41.8 ± 8.5    36.4 ± 8.0    45.1 ± 10.7
Smokers (n)         2              1             0             0

Abbreviations: IDC, infiltrating ductal cancer; CMP, cyclomastopathy; MGF, mammary gland fibroma. 

Estrogen is a common precipitating factor of breast diseases, and 
changes in estrogen metabolism mechanisms play a crucial role in 
the carcinogenic processes of breast cancers. Spink reported that 
cytochrome P450 1B1 catalyzes the conversion of estrogen to 
4-hydroxyestradiol (4-OH-E2), which is a main driver of carcinogenic 
mechanisms in breast cancer tissues. Liehr et al. demonstrated that 
the ratio of 4-OH-E2 to 2-hydroxyestradiol (2-OH-E2) is 
significantly higher in breast cancer tissue than in normal breast tissue. 
Estrogen metabolites can promote carcinogenesis by damaging 
cellular macromolecules and promoting the proliferation of injured 
cells through receptor-mediated processes. Changes in estrogen 
metabolism can generate certain distinct volatile substances, which 
may be associated with the chemicals that we observed at elevated 
levels in the exhaled air from breast cancer patients. However, the 
detailed mechanisms of these associations must be elucidated 
through additional clinical research. Furthermore, our observations 
that the concentrations of certain volatile compounds were at lower 
than normal levels in the exhaled air from breast cancer patients may 
relate to cellular consumption and increases in P450 activity. 

There are certain limitations associated with this study. First, the impact of 
subject age was not addressed. The individuals in the breast cancer, 
cyclomastopathy, and mammary gland fibroma groups were 
approximately 50, 40, and 30 years of age, respectively (Table 2). Existing 
studies have demonstrated that increased age is associated with 
elevated oxidative stress levels, and the resulting oxidation products 
may damage proteins, DNA, lipids, and other biological 
macromolecules, thereby generating certain volatile metabolites that may 
affect the results of this experiment. Second, although the breast 
cancer patients suffered from the same pathological type of breast 
cancer, there was no attempt to differentiate among breast cancers of 
different grades in this experiment. The presence of tumors of 
different clinical grades may affect the experimental results. In future 
studies, we will expand our sample size and differentiate among 
pathological types and stages of tumors in greater detail, allowing 
for more accurate volatile biomarkers to be identified. 

Breast cancer, cyclomastopathy, and breast fibroma exhibit 
specific metabolic profiles with respect to volatile metabolites. Three 
volatile organic metabolites (2,5,6-trimethyloctane, 
1,4-dimethoxy-2,3-butanediol, and cyclohexanone) associated with breast cancer 
may serve as novel diagnostic biomarkers. 

Methods 

Human Subjects. The present experiments were conducted in accordance with the 
Declaration of Helsinki. The protocol in this study was approved by the Ethics 
Committee at Harbin Medical University (No. 201314), and written informed consent 
was obtained from patients prior to study enrollment. This study was conducted 
between May 2011 and April 2012 at the Department of Anesthesiology at the First 
Affiliated Hospital of Harbin Medical University. 

Included in this study were women between 25 and 80 years of age identified as 
ASA I and II individuals and scheduled for breast surgery. The following exclusion 
criteria were used: 1) currently breastfeeding, pregnant, or possibly becoming 
pregnant; 2) a diagnosis of a known congenital disease; 3) radiotherapy or 
chemotherapy treatment prior to testing or another diagnosed malignant cancer at the time 
of testing; 4) co-existent chronic obstructive lung disease, asthma, pulmonary 
tuberculosis, or other pulmonary diseases; 5) the presence of a chronic inflammatory 
disease; and 6) the manifestation of any acute disease symptoms during the preceding 
two weeks. Moreover, to ensure uniformity of the experimental results by minimizing 
the impact of diet and environment on the composition of the subjects' exhaled 
breath, study participants were asked to strictly fast for 8 hours (h) prior to breath 
sample collection. In addition to the group of breast cancer patients, this study also 
examined healthy female volunteers. The inclusion criterion for these individuals was 
the absence of a history of malignancies or infectious diseases. This study involved a 
total of 85 patients with histologically confirmed cases of breast disease (including 39 
individuals with infiltrating ductal cancer, 21 individuals with mammary gland 
fibroma, and 25 individuals with cyclomastopathy) and 45 healthy volunteers (who 
were negative for breast cancer by mammography and ultrasound examination). The 
demographic characteristics are summarized in Table 2. 

Breath Collection. Breath sampling and the parallel collection of ambient air were 
performed within 24 h after overnight fasting. Alveolar breath sampling was 
performed as previously described. Briefly, 20 ml of exhaled gas was drawn into a 
gas-tight syringe (50 ml; Agilent Inc., USA). These samples were transferred 
immediately into evacuated 20 ml glass vials (Supelco Inc., USA). All vials were 
flushed thoroughly and cleaned with nitrogen gas (purity of 99.999%, Liming gas Inc., 
China) before being evacuated for breath sampling to remove any residual 
contaminants. All samples were analyzed within 3 h post-sampling. 

Solid-Phase Microextraction (SPME). A manual SPME holder with carboxen/ 
polydimethylsiloxane (CAR/PDMS) fibers of 75 µm thickness was purchased from 
Supelco (Bellefonte, USA). The SPME fiber was inserted into the vial and exposed to 
the gaseous sample for 20 min at 40 °C. Subsequently, the desorption of volatiles 
occurred in the hot GC injector at 200 °C for 2 min. 

Gas Chromatography-Mass Spectrometry (GC/MS) Analysis. Analysis was 
performed on a GC/MS (Shimadzu GC-MS QP 2010, Shimadzu, Japan) equipped 
with a DB-5MS (length 30 m × ID 0.250 mm × film thickness 0.25 µm; Agilent 
Technologies, USA) plot column. Injections were performed in the splitless mode, 
and the splitless time was 1 min. The temperature of the injector was 200 °C. The flow rate 
of the helium (99.999%) carrier gas was kept constant at 2 ml/min. The column 
temperature was held at 40 °C for 2 min to concentrate the hydrocarbons at the head 
of the column, then increased at 70 °C/min to 200 °C and held for 1 min, and then ramped 
at 20 °C/min to 230 °C and held for 3 min. The MS analyses were performed in full-scan mode, 
using a scan range of 35-200 amu. The ion source was maintained at 200 °C, and an 
ionization energy of 70 eV was used for each measurement. 

Extraction and Pretreatment of the GC/MS Raw Data. Raw GC/MS data were 
converted into CDF format (NetCDF) files using Shimadzu GCMS Postrun Analysis 
software and subsequently processed using the XCMS toolbox (http://metlin.scripps. 
edu/download/). The XCMS parameters consisted of the default settings with the 
following exceptions: xcmsSet (fwhm = 8, snthresh = 6, max = 200); retcor (method 
= "linear", family = "gaussian", plottype = "mdevden"); and a bandwidth of eight 
for the first grouping command and four for the second grouping command. The data 
set of the aligned mass ions was exported from XCMS and further processed 
using Microsoft Excel to normalize the data prior to multivariate analyses. 
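
The Results report only metabolites that were consistently detected in 50% of the samples in each comparison. A filter of that kind on the exported table of aligned ions might be sketched as below. The sketch assumes that "detected" means a nonzero aligned intensity and that the cutoff is "at least 50%"; neither detail is stated in the text, and the function name and feature labels are hypothetical. The normalization step is not shown because its method is not specified.

```python
def filter_by_detection_rate(intensity_table, min_fraction=0.5):
    """Keep features with a nonzero intensity in at least `min_fraction` of samples."""
    kept = {}
    for feature, intensities in intensity_table.items():
        detected = sum(1 for x in intensities if x > 0)
        if detected / len(intensities) >= min_fraction:
            kept[feature] = intensities
    return kept


# Hypothetical aligned-ion table: feature label -> intensity per sample.
table = {
    "m/z 43 @ 4.29 min": [120.0, 0.0, 95.0, 110.0],  # detected in 3 of 4 samples
    "m/z 57 @ 8.36 min": [0.0, 0.0, 30.0, 0.0],      # detected in 1 of 4 samples
}
kept = filter_by_detection_rate(table)
```

Filtering before the multivariate analysis keeps sporadically detected ions from driving the separation between groups.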

Statistical Analyses. The normalized data were exported to SIMCA-P 11.5 for 
principal component analysis (PCA), partial least-squares discriminant analysis 
(PLSDA), and orthogonal partial least-squares discriminant analysis (OPLSDA). To 
prevent overfitting, the default seven-round cross-validation in the SIMCA-P 
software was applied, and permutation tests using 100 iterations were performed to 
further validate the supervised model. Additionally, the nonparametric 
Kruskal-Wallis rank sum test was performed for each metabolite, and the corresponding false 
discovery rate (FDR) based on p-values was calculated to correct for multiple 
comparisons. The potential metabolic biomarkers were selected using thresholds of 
1.5 for the variable importance in the projection (VIP) values calculated from the 
OPLSDA model and 0.05 for the FDR values. 
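
The two-threshold selection described above can be sketched as follows. The text does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, as are the direction of the cutoffs (VIP of at least 1.5, FDR below 0.05), the function names, and the example values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_biomarkers(names, vip, p_values, vip_cutoff=1.5, fdr_cutoff=0.05):
    """Keep features that pass both the VIP and the FDR threshold (assumed directions)."""
    fdr = benjamini_hochberg(p_values)
    return [n for n, v, q in zip(names, vip, fdr) if v >= vip_cutoff and q < fdr_cutoff]


# Hypothetical features, VIP scores, and raw p-values.
names = ["A", "B", "C", "D"]
vip = [2.1, 1.6, 0.9, 1.8]
p_values = [0.001, 0.02, 0.03, 0.5]
selected = select_biomarkers(names, vip, p_values)
```

In this example, feature C fails the VIP cutoff and feature D fails the FDR cutoff, so only A and B are retained. Requiring both criteria combines multivariate importance with univariate significance.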

Abbreviations. VOCs, volatile organic compounds; CAR/PDMS, carboxen/ 
polydimethylsiloxane; SPME, solid-phase microextraction; GC/MS, gas 
chromatography-mass spectrometry; PCA, principal component analysis; PLSDA, 
partial least-squares discriminant analysis; FDR, false discovery rate; VIP, variable 
importance in the projection; CYP, component of the cytochrome P450; 4-OH-E2, 
4-hydroxyestradiol; 2-OH-E2, 2-hydroxyestradiol. 